Text Extraction Algorithm using the HTML Logical Structure Analysis
Similar resources
Information Extraction from HTML Documents Based on Logical Document Structure
The World Wide Web presents the largest Internet source of information from a broad range of areas. The web documents are mostly written in the Hypertext Markup Language (HTML) that doesn’t contain any means for semantic description of the content and thus the contained information cannot be processed directly. Current approaches for the information extraction from HTML are mostly based on wrap...
Extraction-Based Text Summarization Using Fuzzy Analysis
Due to the explosive growth of the World Wide Web, automatic text summarization has become an essential tool for web users. In this paper we present a novel approach for creating text summaries. Using fuzzy logic and WordNet, our model extracts the most relevant sentences from an original document. The approach utilizes fuzzy measures and inference on the extracted textual information from the docu...
Concepts Extraction based on HTML Documents Structure
Traditional methods for automatically acquiring ontology concepts from a textual corpus often favor analysis of the text itself, whether they take a statistical or a linguistic approach. In this paper, we extend these methods by considering the document structure, which provides useful information about the meaning contained in the texts. Our approach focuses on the st...
Extracting Logical Hierarchical Structure of HTML Documents Based on Headings
We propose a method for extracting the logical hierarchical structure of HTML documents. Because the mark-up structure of an HTML document does not necessarily coincide with its logical hierarchical structure, extracting the logical structure is not trivial. Human readers, however, easily understand the logical structure. The key information they use is the headings in the documents. H...
Journal
Journal title: Journal of Digital Contents Society
Year: 2015
ISSN: 1598-2009
DOI: 10.9728/dcs.2015.16.3.445